ggml-vulkan: disable transfer queue on UMA by RipleyTom · Pull Request #20441 · ggml-org/llama.cpp

RipleyTom · 2026-03-12T05:31:07Z

Fixes #20439.

I am unsure why transfer queues add such a huge overhead. It was at least 10 GB: I tried reserving that much when loading the model and it still choked.

Summary from big C:

Using the separate SDMA/transfer queue for async copies is counterproductive here because:

- There's no separate device memory to transfer to/from: it's all unified memory.
- The SDMA engine and its associated driver structures (kernel buffer objects for command streams, page table entries, cross-queue synchronization state) consume memory from the same pool.
- The timeline semaphore synchronization between compute and transfer queues adds driver overhead with no benefit, since the compute queue can issue buffer copies just as efficiently on UMA.
- The transfer queue's command pool (transfer_cmd_pool) and its command buffers accumulate during model loading alongside compute_cmd_pool, effectively doubling the command infrastructure.
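The decision described above can be sketched roughly as follows. This is not the actual diff from this PR: `DeviceInfo`, `CopyQueue`, and `select_copy_queue` are hypothetical names made up for illustration, and the sketch assumes the backend already knows whether the device is UMA and which queue families it has.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the backend's per-device state. The field names
// are illustrative only and are not taken from ggml-vulkan.
struct DeviceInfo {
    bool     uma;                   // CPU and GPU share one physical memory pool
    uint32_t compute_queue_family;  // queue family used for compute dispatches
    uint32_t transfer_queue_family; // dedicated transfer (SDMA) family, if distinct
};

enum class CopyQueue { Compute, Transfer };

// Pick the queue that should carry buffer copies.
// On UMA, or when there is no distinct transfer family, copies stay on the
// compute queue so no second command pool or cross-queue semaphore is needed.
inline CopyQueue select_copy_queue(const DeviceInfo & dev) {
    const bool no_dedicated_family =
        dev.transfer_queue_family == dev.compute_queue_family;
    return (dev.uma || no_dedicated_family) ? CopyQueue::Compute
                                            : CopyQueue::Transfer;
}

// Usage sketch: an integrated (UMA) GPU keeps copies on the compute queue even
// though a transfer family exists; a discrete GPU still uses the transfer queue.
inline void example() {
    const DeviceInfo igpu{true, 0, 1};
    const DeviceInfo dgpu{false, 0, 1};
    assert(select_copy_queue(igpu) == CopyQueue::Compute);
    assert(select_copy_queue(dgpu) == CopyQueue::Transfer);
}
```

Keeping the check in one place means discrete GPUs, where an async transfer queue can still overlap copies with compute, are unaffected.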

ggml-vulkan: disable transfer queue on UMA

de3591c

RipleyTom requested a review from 0cc4m as a code owner March 12, 2026 05:31

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 12, 2026

RipleyTom wants to merge 1 commit into ggml-org:master from RipleyTom:fix_tq_uma
